{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# LAB 03.01 - Model Generation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!wget --no-cache -O init.py -q https://raw.githubusercontent.com/fagonzalezo/ai4eng-unal/main/content/init.py\n", "import init; init.init(force_download=False); init.get_weblink()\n", "init.endpoint" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from local.lib.rlxmoocapi import submit, session\n", "session.LoginSequence(endpoint=init.endpoint, course_id=init.course_id, lab_id=\"L03.01\", varname=\"student\");" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import make_moons\n", "from local.lib import mlutils\n", "from IPython.display import Image\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A machine learning task\n", "\n", "We have two species of bugs (**X bugs** and **Z bugs**). For each bug we have measured its **width** and **length**. Once we have a bug, determining whether it is of **species X** or **species Z** is very costly (lab analysis, etc.).\n", "\n", "**Machine learning goal**: We want to create a model that, given the width and length of a bug, tells us whether it belongs to **species X** or **species Z**. If the model performs well, we might use it instead of the lab analysis.\n", "\n", "**To train a machine learning model** we built a **training dataset** in which we **annotated** 20 bugs with their **confirmed** species. 
The training dataset has:\n", "\n", "- 20 data items\n", "- two data columns (**width** and **length**)\n", "- one label column, with two unique values: **0 for species X**, and **1 for species Z**.\n" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(20, 3) (20, 2) (20,)\n", "[[0.5 0.65]\n", " [0.75 0.34]\n", " [0.37 0.5 ]\n", " [0.57 0.74]\n", " [1. 0.69]]\n", "[0. 1. 1. 0. 1.]\n" ] }, { "data": { "text/html": [ "
<table>\n", "  <thead>\n", "    <tr><th></th><th>width</th><th>height</th><th>y</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0.50</td><td>0.65</td><td>0.0</td></tr>\n", "    <tr><th>1</th><td>0.75</td><td>0.34</td><td>1.0</td></tr>\n", "    <tr><th>2</th><td>0.37</td><td>0.50</td><td>1.0</td></tr>\n", "    <tr><th>3</th><td>0.57</td><td>0.74</td><td>0.0</td></tr>\n", "    <tr><th>4</th><td>1.00</td><td>0.69</td><td>1.0</td></tr>\n", "  </tbody>\n", "</table>\n", "
" ], "text/plain": [ " width height y\n", "0 0.50 0.65 0.0\n", "1 0.75 0.34 1.0\n", "2 0.37 0.50 1.0\n", "3 0.57 0.74 0.0\n", "4 1.00 0.69 1.0" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "d = pd.read_csv(\"local/data/trilotropicos_small.csv\")\n", "X,y = d.values[:,:2], d.values[:,-1]\n", "print (d.shape, X.shape, y.shape)\n", "print (X[:5])\n", "print (y[:5])\n", "d.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since it is just two columns, we can visualize it" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAcVklEQVR4nO3df5DV9X3v8ed7+SHFRVRo99Is+6O3JNVYjSxX0wFvIMYGmbkwpOiQrFa96iY4oGPUe3FoNNI6d5qmwYkxNYtxomHrSrRa2mDRmt0YKmT4oaUBg0OQXXdMy88qK1mE7vv+8T0Lh+Xs7nf3nM/59X09Zs6c8/1+P3z3/eHsnvf5/Ph+vubuiIhIclUUOgARESksJQIRkYRTIhARSTglAhGRhFMiEBFJuNGFDmC4Jk+e7HV1dVmf58MPP+Tcc8/NPqASofqWN9W3vOWivtu2bTvo7r+d6VjJJYK6ujq2bt2a9Xna29uZPXt29gGVCNW3vKm+5S0X9TWzjoGOqWtIRCThlAhERBJOiUBEJOFKboxARGQwJ06coKuri56enkKHkjMTJ07krbfeilV23LhxVFdXM2bMmNjnVyIQkbLS1dXFhAkTqKurw8wKHU5OHD16lAkTJgxZzt05dOgQXV1d1NfXxz6/uoZEpKz09PQwadKkskkCw2FmTJo0aditISUCESk7SUwCfUZSdyUCEZGEC5YIzOxJM9tvZr8Y4LiZ2bfNbI+Z7TCz6aFikeLX0gJ1dVBRET23tBQ6IpGReffdd6mvr+fw4cMAHDlyhPr6ejo6zryea9++fVxyySWFCPEsIVsEPwDmDnL8WmBa6tEE/E3AWKSItbRAUxN0dIB79NzUpGQgpWnq1KksWbKE5cuXA7B8+XKampqora0tcGQDC5YI3P014PAgRRYAT3tkM3C+mU0JFY8UrxUr4NixM/cdOxbtFwktRGv07rvvZvPmzTzyyCNs3LiRe+65J2O5kydPctNNN3HppZeyaNEijqX+EOrq6jh48CAAW7duZd68eQAcOHCAa665hunTp/PlL3+Z2traU+WyUcjpox8D3k3b7krt+3X/gmbWRNRqoKqqivb29qx/eHd3d07OUyqKub7Llg18bKQhF3N9Q1B9T5s4cSJHjx6NdZ61a0ezbNk4fvObaIC1owNuv93p6enh+utPZhXjQw89xBe+8AVefPFFjh8/zvHjx8+qw+7du3n00Uf5zne+wx133MGqVau48847cXe6u7s555xz+PDDD3F3jh49yooVK5g5cyb33HMPr7zyCs3NzafKpevp6Rne74O7B3sAdcAvBjj2Y2BW2v
arQMNQ52xoaPBcaGtry8l5SkUx17e21j3qFDrzUVs78nMWc31DUH1P27VrV+zzhPjd63PXXXf5lClT/Fvf+lbG4++8845PnTr11Parr77qCxYsSMVV6wcOHHB39y1btvisWbPc3f2yyy7zvXv3nvo3F1xwwaly6TL9HwBbfYDP1ULOGuoCpqZtVwPvFSgWKaCHH4bx48/cN358tF8kpM7O4e2P68033+SVV15h8+bNrFq1il//+qyODuDsqZ5926NHj6a3txfgjGsCos/z3CtkIlgH/Glq9tCngffdPfP/lpS1xkZobobaWjCLnpubo/2FollMyVBTM7z9cbg7S5Ys4ZFHHqGmpob77ruPe++9N2PZzs5ONm3aBMAzzzzDrFmzgGiMYNu2bQA8//zzp8rPmjWLtWvXAvDyyy9z5MiRkQeaJuT00WeATcAnzKzLzG41s6+Y2VdSRdYDe4E9wGrgjlCxSPFrbIR9+6C3N3oudBLQLKZkCNEaXb16NTU1NVxzzTUA3HHHHfzyl7/kpz/96VllL7roIp566ikuvfRSDh8+zJIlSwB48MEHueuuu7jqqqsYNWrUqfIPPvggL7/8MtOnT+ell15iypQpsZaeGNJAfUbF+tAYwWlr1kR9mWbR85o1A5cth/oORzb1DdlvHIre39OGM0bgPry/o0L54IMP3N29p6fHT5w44e7ur7/+ul922WUZyw93jECLzpWovm+tfdMu+761QmG/TZeDUP3GUpwaG0vnb6azs5Prr7+e3t5exo4dy+rVq3NyXiWCEjXY3PtS+aUuVjU1UWLNtF+kkKZNm8Ybb7yR8/NqraESpW+t4WgWkySNEkGJCjHbQSLFOItJJCQlghKlb61hFdMsJpHQlAhKlL61ikiuKBGUMH1rFSk+L7zwAp/61KfOeFRUVPDSSy+dUa6YlqHWrCERkRxauHAhCxcuPLXd3NxMS0sLn//85wsY1eDUIhCRZAu4nsjbb7/NypUr+eEPf0hFxdkft8WyDLUSgYgkV8D1RE6cOMGXvvQlvvnNb1IzwHS+3bt309TUxI4dOzjvvPP47ne/O+g5H3roIT772c+yfft2Fi5cSGeO5osrEYhIcgW8K9LXvvY1PvnJT7J48eIBy0ydOpWZM2cCcMMNN7Bx48ZBz7lx48ZT55s7dy4XXHBB1nGCxghEJMkCXZnZ3t7O888/z/bt2wctp2WoRUQKLcCVmUeOHOGWW27h6aefHnJl0LJfhlpEpOgFuDLz8ccfZ//+/SxZsuSMKaTPPvvsWWWLZRlqdQ2JSHL1XXyzYkXUHVRTEyWBLC7Kuf/++7n//vuHLFdXV8euXbsyHrvqqqt4++23T2333YN54sSJbNiwgdGjR7Np0yba2trOul/xSCgRiEiyldA61FqGWkQk4bQMtYhITKFm15SCkdRdiUBEysq4ceM4dOhQIpOBu3Po0CHGjRs3rH+nriERKSvV1dV0dXVx4MCBQoeSMz09PbE/3MeNG0d1dfWwzq9EICJlZcyYMdTX1xc6jJxqb2/n8ssvD3Z+dQ2JiCScEoGISMIpEYiIJJwSgcQScMl2ESkwDRbLkPqWbO9brbdvyXYomQsyRWQQahHIkAIu2S4iRUCJQIYUaMl2ESkSSgQypABLtotIEQmaCMxsrpntNrM9ZrY8w/EaM2szszfMbIeZzQsZj4xMgCXbRaSIBEsEZjYKeAy4FrgY+KKZXdyv2J8Ba939cmAxMPidm6UgGhuhuRlqa8Esem5u1kCxSLkIOWvoCmCPu+8FMLNWYAGQficGB85LvZ4IvBcwHslCCS3ZLiLDFDIRfAx4N227C7iyX5mvAy+b2TLgXOBzAeMREZEMLNRSrWZ2HfB5d78ttX0jcIW7L0sr89VUDH9tZn8EfB+4xN17+52rCWgCqKqqamhtbc06vu7ubiorK7M+T6lQfcub6lveclHfOXPmbHP3GZmOhWwRdAFT07arObvr51ZgLoC7bzKzccBkYH
96IXdvBpoBZsyY4bNnz846uPb2dnJxnlKh+pY31be8ha5vyFlDW4BpZlZvZmOJBoPX9SvTCVwNYGYXAeOA8llEPMm0JoVIyQjWInD3k2a2FNgAjAKedPedZrYS2Oru64B7gNVmdjfRwPHNnsTbCpUbrUkhUlKCrjXk7uuB9f32PZD2ehcwM2QMUgCDrUmhRCBSdHRlseSe1qQQKSlKBJJ7WpNCpKQoEUjuaU0KkZKiRCC5pzUpckITryRfdGMaCUNrUmRFE68kn9QiEClCuhmQ5JMSgUgR0sQryafEJYK+ftdt29TvKsVLE68knxKVCPr6XTs6ou2+flclAyk2mngl+ZSoRKB+VykVmngl+ZSoWUPqd5VSoolXki+JahGo31VE5GyJSgTqdxUROVuiEkF6vyuo31VEilyepjkmaowATve7trfDvn2FjkZEZAB5vLw8US0CEZGSkcdpjkoEIiLFKI/THJUIRESKUR6nOSoRiIgUozxOc1QiEBEpRnmc5qhEIJIDuomMBNHYGE1vbGiIngPNdU/c9FGRXNNNZKTUqUUgkiUtZiilTolAJEtazFBKnRKBSJa0mKGUOiUCkSxpMUMpdUoEIlnSTWSk1GnWkEgO6CYyUsrUIsgTzTMXkWKlFkEeaJ65iBSzoC0CM5trZrvNbI+ZLR+gzPVmtsvMdprZ34aMp1A0z1xEilmwFoGZjQIeA64BuoAtZrbO3XellZkG3A/MdPcjZvY7oeIpJM0zF5FiFrJFcAWwx933uvtHQCuwoF+Z24HH3P0IgLvvDxhPwWieuYgUM3P3MCc2WwTMdffbUts3Ale6+9K0Mi8CbwMzgVHA1939nzKcqwloAqiqqmpobW3NOr7u7m4qKyuzPk8chw9H4wK9vaf3VVRE0wwvvDAvIeS1vsVA9S1vqu/wzZkzZ5u7z8h40N2DPIDrgCfStm8EHu1X5h+BF4AxQD1RF9L5g523oaHBc6GtrS0n54lrzRr32lp3s+h5zZq8/vi817fQVN/ypvoOH7DVB/hcDTlrqAuYmrZdDbyXocxmdz8BvGNmu4FpwJaAcRWE5pmLSLEKOUawBZhmZvVmNhZYDKzrV+ZFYA6AmU0GPg7sDRiTiIj0EywRuPtJYCmwAXgLWOvuO81spZnNTxXbABwys11AG3Cfux8KFZOIiJwt6AVl7r4eWN9v3wNprx34auohIiIFEDsRpK4LqEr/N+6umfAiIiUuViIws2XAg8B/AH2TIB24NFBcIiKSJ3FbBHcBn1D/vYhI+Yk7WPwu8H7IQEREpDAGbRGYWd8g7l6g3cx+DBzvO+7u3woYm4iI5MFQXUMTUs+dqcfY1AOiMQIRESlxgyYCd38IwMyuc/cfpR8zs+tCBiYiIvkRd4zg/pj7RESkxAw1RnAtMA/4mJl9O+3QecDJkIGJiEh+DDVG8B6wFZgPbEvbfxS4O1RQIiKSP4N2Dbn7v7r7U8Dvu/tTaY+/89TNZETypqUF6uqimznU1UXbIpK1uBeUbTez/rOE3idqLfyFLjST4FpaoKnp9M2fOzqibdD63iJZijtY/BLwY6Ax9fgH4GfAvwM/CBKZSLoVK04ngT7HjkX7RSQrcVsEM919Ztr2v5nZv7j7TDO7IURgImfoHGB9w4H2i0hscVsElWZ2Zd+GmV0B9N1AU7OHJLyamuHtF5HY4iaC24AnzOwdM9sHPAHcbmbnAv8vVHAipzz8MIwff+a+8eOj/SKSlVhdQ+6+BfhDM5sImLv/Z9rhtUEiE0nXNyC8YkXUHVRTEyUBDRSLZC3u/QjOAf4EqANGmxkA7r4yWGQi/TU26oNfJIC4g8V/TzRddBtpq4+KiEjpi5sIqt19btBIRESkIOIOFr9uZn8YNBIR0cXTUhBxWwSzgJvN7B2iriED3N11z2KRHNHF01IocRPBtUGjEJFBL55WIpCQYnUNuXsHMBX4bOr1sbj/VkTi0cXTUiixPszN7EHg/3L6ZjRjgDWhghJJIl
08LYUS91v9QqJ7EnwI4O7vcfp+xiKSA7p4WgolbiL4yN2d1A3rU0tLiEgONTZCczPU1oJZ9NzcrPEBCS/uYPFaM/secL6Z3Q78b2B1uLBEkkkXT0shxB0s/ibwHPA88AngAXd/NGRgIiI5pYs0BhS3RYC7vwK8EjAWEZEwdJHGoAZtEZjZUTP7IMPjqJl9MNTJzWyume02sz1mtnyQcovMzM1sxkgqISIyKN3hblCDtgjcfcQzg8xsFPAYcA3QBWwxs3XuvqtfuQnAncDPR/qzREQGpYs0BhXyorArgD3uvtfdPwJagQUZyv058A2gJ2AsIpJkukhjUBbNCg1wYrNFwFx3vy21fSNwpbsvTStzOfBn7v4nZtYO3OvuWzOcqwloAqiqqmpobW3NOr7u7m4qKyuHLlgmVN/ypvoO4fDhaFygt/f0voqKaI7uhRfmPsAcy8X7O2fOnG3unrn73d2DPIDrgCfStm8EHk3brgDagbrUdjswY6jzNjQ0eC60tbXl5DylQvUtb6pvDGvWuNfWuptFz2vW5DiqcHLx/gJbfYDP1dizhkagi2h9oj7VwHtp2xOAS4D21B3P/huwzszme4ZWgYhIVnSRxoBCjhFsAaaZWb2ZjQUWA+v6Drr7++4+2d3r3L0O2AwoCYiI5FmwRODuJ4GlwAbgLWCtu+80s5VmNj/Uz801XYMiIuUuZNcQ7r4eWN9v3wMDlJ0dMpaR0DUoIpIEuqfAIHQNiogkgRLBIHQNiogkgRLBIHQNikgAGngrOkoEg9CNQkRyrG/graMD3E8PvCkZFJQSwSB0oxCRHNPAW1EKOmuoHOgaFJEc0sBbUVKLQETyRwNvRUmJQETyRwNvRUmJQETyRwNvRUljBCKSXxp4KzpqEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJFzQRGBmc81st5ntMbPlGY5/1cx2mdkOM3vVzGpDxiMiImcLlgjMbBTwGHAtcDHwRTO7uF+xN4AZ7n4p8BzwjVDxiIhIZiFbBFcAe9x9r7t/BLQCC9ILuHubux9LbW4GqgPGI/nU0gJ1dVBRET23tBQ6IhEZgLl7mBObLQLmuvttqe0bgSvdfekA5b8D/Lu7/0WGY01AE0BVVVVDa2tr1vF1d3dTWVmZ9XlKRV7re/gwdHRAb+/pfRUVUFsLF16YlxD0/pY31Xf45syZs83dZ2Q6NjqrMw/OMuzLmHXM7AZgBvCZTMfdvRloBpgxY4bPnj076+Da29vJxXlKRV7rW1cXJYL+amth3768hKD3t7ypvrkVMhF0AVPTtquB9/oXMrPPASuAz7j78YDxSL50dg5vv4gUVMgxgi3ANDOrN7OxwGJgXXoBM7sc+B4w3933B4xF8qmmZnj7RaSggiUCdz8JLAU2AG8Ba919p5mtNLP5qWJ/BVQCPzKzN81s3QCnk1Ly8MMwfvyZ+8aPj/aLSNEJ2TWEu68H1vfb90Da68+F/PlSII2N0fOKFVF3UE1NlAT69otIUQmaCCTBGhv1wS9SIrTEhIhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEjCKRGIiCScEoGISMIpEYiIJJwSgYhIwikRiIgknBKBiEBLS3Sv6YqK6LmlpdARSR7pfgQiSdfSAk1NcOxYtN3REW2D7imREGoRiCTdihWnk0CfY8ei/ZIISgQiSdfZObz9UnaUCESSrqZmePul7CgRiCTdww/D+PFn7hs/PtoviaBEIJJ0jY3Q3Ay1tWAWPTc3a6A4QTRrSESiD3198CeWWgQiIgmnRCAiknBKBCIiCadEICKScEoEIiIJp0Qgkkn6ImyTJ0cPLcgmZUrTR0X6678I26FDp49pQTYpQ0FbBGY218x2m9
keM1ue4fg5ZvZs6vjPzawuZDwisWRahC2dFmQ7k5awLnnBEoGZjQIeA64FLga+aGYX9yt2K3DE3X8fWAX8Zah4RGKLs9iaFmSL9LWeOjrA/XSLScmgpIRsEVwB7HH3ve7+EdAKLOhXZgHwVOr1c8DVZmYBYxIZWpzF1rQgW0RLWJcFc/cwJzZbBMx199tS2zcCV7r70rQyv0iV6Upt/ypV5mC/czUBTQBVVVUNra2tWcfX3d1NZWVl1ucpFarvMBw+HH2z7e3NfLyiIlqP58ILRx5gjhXs/d22beBjDQ3Bfqx+n4dvzpw529x9RsaD7h7kAVwHPJG2fSPwaL8yO4HqtO1fAZMGO29DQ4PnQltbW07OUypU32Fas8a9ttbdzH3SpOhhFu1bsyYHEeZWwd7f2lr3qFPozEdtbdAfq9/n4QO2+gCfqyG7hrqAqWnb1cB7A5Uxs9HAROBwwJhE4mlshH37olbBwYPRo7c32qfZQqdpCeuyEDIRbAGmmVm9mY0FFgPr+pVZB9yUer0I+Ekqc4lIKdAS1mUh2HUE7n7SzJYCG4BRwJPuvtPMVhI1UdYB3wd+aGZ7iFoCi0PFIyKBaAnrkhf0gjJ3Xw+s77fvgbTXPURjCSIiUiBaYkJEJOGUCEREEk6JQEQk4ZQIREQSTolARCThlAhERBIu2FpDoZjZAaAjB6eaDBwcslT5UH3Lm+pb3nJR31p3/+1MB0ouEeSKmW31gRZgKkOqb3lTfctb6Pqqa0hEJOGUCEREEi7JiaC50AHkmepb3lTf8ha0vokdIxARkUiSWwQiIoISgYhI4pV9IjCzuWa228z2mNnyDMfPMbNnU8d/bmZ1+Y8yd2LU96tmtsvMdpjZq2ZWW4g4c2Wo+qaVW2RmbmYlPeUwTn3N7PrUe7zTzP423zHmUozf5xozazOzN1K/0/MKEWcumNmTZrY/dS/3TMfNzL6d+r/YYWbTc/bDB7qHZTk8iG6I8yvg94CxwL8CF/crcwfweOr1YuDZQscduL5zgPGp10vKvb6pchOA14DNwIxCxx34/Z0GvAFckNr+nULHHbi+zcCS1OuLgX2FjjuL+v5PYDrwiwGOzwNeAgz4NPDzXP3scm8RXAHscfe97v4R0Aos6FdmAfBU6vVzwNVmZnmMMZeGrK+7t7n7sdTmZqJ7SZeqOO8vwJ8D3wB68hlcAHHqezvwmLsfAXD3/XmOMZfi1NeB81KvJ3L2fdFLhru/xuD3bF8APO2RzcD5ZjYlFz+73BPBx4B307a7UvsylnH3k8D7wKS8RJd7ceqb7laibxilasj6mtnlwFR3/8d8BhZInPf348DHzexfzGyzmc3NW3S5F6e+XwduMLMuorshLstPaAUx3L/v2ILeqrIIZPpm33++bJwypSJ2XczsBmAG8JmgEYU1aH3NrAJYBdycr4ACi/P+jibqHppN1Nr7mZld4u7/GTi2EOLU94vAD9z9r83sj4jugX6Ju/eGDy/vgn1WlXuLoAuYmrZdzdlNx1NlzGw0UfNysOZZMYtTX8zsc8AKYL67H89TbCEMVd8JwCVAu5ntI+pXXVfCA8Zxf5//3t1PuPs7wG6ixFCK4tT3VmAtgLtvAsYRLdBWjmL9fY9EuSeCLcA0M6s3s7FEg8Hr+pVZB9yUer0I+ImnRmZK0JD1TXWVfI8oCZRy/zEMUV93f9/dJ7t7nbvXEY2JzHf3rYUJN2txfp9fJJoQgJlNJuoq2pvXKHMnTn07gasBzOwiokRwIK9R5s864E9Ts4c+Dbzv7r/OxYnLumvI3U+a2VJgA9EMhCfdfaeZrQS2uvs64PtEzck9RC2BxYWLODsx6/tXQCXwo9SYeKe7zy9Y0FmIWd+yEbO+G4A/NrNdwH8B97n7ocJFPXIx63sPsNrM7ibqJrm5VL/ImdkzRF16k1NjHg8CYwDc/XGiMZB5wB7gGHBLzn52if6fiYhIjpR715CIiAxBiUBEJOGUCEREEk6JQEQk4ZQIRE
QSTolAZATMbL2ZnZ9h/9fN7N7U65vN7HfTju1Lze0XKSpKBCIj4O7zYizbcDPwu0OUESk4JQKRDMzs/5jZnanXq8zsJ6nXV5vZmvRv92a2IrVm/j8Dn0jtW0S0llOLmb1pZr+VOvUyM9tuZv9mZn+Q/5qJnE2JQCSz14CrUq9nAJVmNgaYBfysr5CZNRBdjX458AXgfwC4+3PAVqDR3T/l7r9J/ZOD7j4d+Bvg3nxURGQoSgQimW0DGsxsAnAc2ESUEK4iLRGktl9w92Pu/gFnr4XT39+lnb8upxGLjFBZrzUkMlLufiK1YuktwOvADqLF3P478Fb/4sM4dd9qr/+F/v6kSKhFIDKw14i6b14jagV8BXiz36JmrwELzey3Uq2H/5V27CjRUtgiRU2JQGRgPwOmAJvc/T+IbnWZ3i2Eu28HngXeBJ7vd/wHwOP9BotFio5WHxURSTi1CEREEk6JQEQk4ZQIREQSTolARCThlAhERBJOiUBEJOGUCEREEu7/AzVWxzWTcdfKAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "\n", "plt.scatter(X[y==0][:,0], X[y==0][:,1], color=\"blue\", label=\"X bug\")\n", "plt.scatter(X[y==1][:,0], X[y==1][:,1], color=\"red\", label=\"Z bug\")\n", "plt.xlabel(\"width\");plt.ylabel(\"length\"); plt.legend(); plt.grid();\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1. Manually use a predictive model\n", "\n", "We give you a procedure somewhat calibrated so that, given a new bug, it produces a prediction. The procedure depends on two parameters $\\theta_0$ and $\\theta_1$. Given the width $w^{(i)}$ and height $h^{(i)}$ of bug number $i$, the prediction $\\hat{y}^{(i)} \\in \\{0, 1\\}$ is computed as follows:\n", "\n", "$$\\hat{y}^{(i)} = 0\\text{ if }w^{(i)}<\\theta_0\\text{ AND }h^{(i)}>\\theta_1;\\;\\;\\;\\;\\;\\text{otherwise }\\hat{y}^{(i)}=1$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be considered as a **model template**, depending on two parameters.\n", "\n", "\n", "Complete **the following function** so that whenever given a `numpy` array `X` $\\in \\mathbb{R}^m \\times \\mathbb{R}^2$ containing the width and height of $m$ bugs, returns a vector $\\in \\mathbb{R}^m$ with the predictions of the $m$ bugs as described in the expression above. The parameter `t` $\\in \\mathbb{R}^2$ contains, in this order, $\\theta_0$ and $\\theta_1$\n", "\n", "Observe that your function must return a `numpy` vector of **integers** (not booleans). \n", "\n", "**CHALLENGE**: solve it with one single line of code\n", "\n", "**HINT**: use `.astype(int)` to convert a `numpy` array of booleans to integers." ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "def predict(X, t):\n", " return ..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check your code manually. With the following `t`, your predictions must be\n", "\n", "    [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]\n", "\n", "with an accuracy of 0.75." ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "t = np.r_[.5,.3]\n", "y_hat = predict(X, t)\n", "y_hat" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "np.mean(y==y_hat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe the classification boundary that the model generates." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now with another `t`. Which one is better?" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "t = np.r_[.5,.8]\n", "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();\n", "np.mean(y==predict(X,t))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe the prediction boundaries of other models. Change the `max_depth` of the decision tree to 2. Does the resulting boundary look familiar?" 
] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "mlutils.plot_2Ddata_with_boundary(LogisticRegression().fit(X,y).predict, X, y); plt.grid();" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "mlutils.plot_2Ddata_with_boundary(DecisionTreeClassifier(max_depth=5).fit(X,y).predict, X, y); plt.grid();" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from sklearn.svm import SVC\n", "mlutils.plot_2Ddata_with_boundary(SVC(gamma=50).fit(X,y).predict, X, y); plt.grid();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**submit your answer**" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "student.submit_task(globals(), task_id=\"task_01\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2. Fit the model\n", "\n", "Given a set of annotated data $X$, $y$ and the **model template** of the previous exercise, complete the following function so that it returns the $\\theta_0$ and $\\theta_1$ that produce the **best accuracy** on the given `X` and `y`. Consider only values of $\\theta_0$ and $\\theta_1$ **between 0 and 1 with one decimal place**.\n", "\n", "**Hint**: use a brute-force approach and consider all combinations of $\\theta_0$ and $\\theta_1 \\in$ [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]. 
Use [`np.linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) and [`itertools.product`](https://docs.python.org/3/library/itertools.html#itertools.product).\n", "\n", "Your function must return a `numpy` array with two elements: the resulting $\\theta_0$ and $\\theta_1$." ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [], "source": [ "import itertools\n", "\n", "\n", "def fit(X,y):\n", "    def predict(X, t):\n", "        return ...\n", "\n", "    return ..." ] }, { "cell_type": "code", "execution_count": 203, "metadata": {}, "outputs": [], "source": [ "t = fit(X,y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check your solution with the code below. The `t` returned by your function should produce an accuracy of 0.9 with the example data `X`, `y`. Several values of `t` might produce the same accuracy; you only have to return one of them." ] }, { "cell_type": "code", "execution_count": 204, "metadata": {}, "outputs": [], "source": [ "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,t), X, y); plt.grid();\n", "np.mean(y==predict(X,t))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use your model on different data. Execute the next cells several times to see the effect on different datasets." 
] }, { "cell_type": "code", "execution_count": 229, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_blobs\n", "from sklearn.preprocessing import MinMaxScaler\n", "\n", "bX, by = make_blobs(100,n_features=2, centers=2)\n", "bX = MinMaxScaler(feature_range=(0.1,.9)).fit_transform(bX)" ] }, { "cell_type": "code", "execution_count": 230, "metadata": {}, "outputs": [], "source": [ "bt = fit(bX, by)" ] }, { "cell_type": "code", "execution_count": 231, "metadata": {}, "outputs": [], "source": [ "mlutils.plot_2Ddata_with_boundary(lambda X: predict(X,bt), bX, by); plt.grid();\n", "np.mean(by==predict(bX,bt))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**submit your answer**" ] }, { "cell_type": "code", "execution_count": 197, "metadata": {}, "outputs": [], "source": [ "student.submit_task(globals(), task_id=\"task_02\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 3: Make an `sklearn` compatible class with your model\n", "\n", "organize the previous methods in the following class structure. 
Bear in mind that:\n", "\n", "- the `fit` method no longer returns `t`; it stores it in the instance variable `self.t`\n", "- the `fit` method must now return `self`\n", "- the `predict` method no longer accepts `t` as an argument; it must use the one stored in `self.t`" ] }, { "cell_type": "code", "execution_count": 294, "metadata": {}, "outputs": [], "source": [ "def SimpleModel():\n", "    class _SimpleModel:\n", "\n", "        def __init__(self):\n", "            pass\n", "\n", "        def fit(self, X, y):\n", "\n", "            ...\n", "            return self\n", "\n", "        def predict(self, X):\n", "            return ...\n", "\n", "    return _SimpleModel()" ] }, { "cell_type": "code", "execution_count": 295, "metadata": {}, "outputs": [], "source": [ "m = SimpleModel()\n", "m.fit(X,y)\n", "m.predict(X)" ] }, { "cell_type": "code", "execution_count": 296, "metadata": {}, "outputs": [], "source": [ "mlutils.plot_2Ddata_with_boundary(m.predict, X, y); plt.grid();\n", "np.mean(y==m.predict(X))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check your model with different parametrizations of the `moons` dataset (more or fewer data points, more or less noise)." ] }, { "cell_type": "code", "execution_count": 293, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_moons\n", "\n", "mX, my = make_moons(100, noise=.1)\n", "m = SimpleModel()\n", "m.fit(mX,my)\n", "\n", "mlutils.plot_2Ddata_with_boundary(m.predict, mX, my); plt.grid();\n", "np.mean(my==m.predict(mX))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**submit your answer**" ] }, { "cell_type": "code", "execution_count": 313, "metadata": {}, "outputs": [], "source": [ "student.submit_task(globals(), task_id=\"task_03\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }